Contrastive Intrinsic Control for Unsupervised Reinforcement Learning

Neural Information Processing Systems

Unlike knowledge-based and data-based algorithms, competence-based algorithms simultaneously address both the exploration challenge and the distillation of the generated experience into reusable skills.



Unsupervised Reinforcement Learning with Contrastive Intrinsic Control

Neural Information Processing Systems

We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skill vectors to learn behaviour embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.
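The contrastive objective described in the abstract can be illustrated with a small InfoNCE-style sketch. This is a generic contrastive lower bound on the mutual information between transition embeddings and skill embeddings, not the paper's exact estimator; the function name, shapes, and temperature are assumptions made for illustration.

```python
import numpy as np

def cic_infonce_loss(transition_emb, skill_emb, temperature=0.5):
    """InfoNCE-style lower bound on I(state-transitions; skills).

    transition_emb, skill_emb: (batch, dim) arrays where row i of each
    comes from the same trajectory (a positive pair); every other row
    in the batch serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    t = transition_emb / np.linalg.norm(transition_emb, axis=1, keepdims=True)
    s = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)
    logits = t @ s.T / temperature  # (batch, batch) similarity matrix
    # log-softmax over each row; diagonal entries are the positive pairs
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls each transition embedding toward its own skill vector and pushes it away from the other skills in the batch, which is the mutual-information maximization the abstract refers to.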



A Additional Implementation Details

Neural Information Processing Systems

These hyperparameters are fixed throughout all domains. Tab. 1 details the hyperparameters used in MOSS, which are taken directly from prior work. We include the environment renders in Figure ??.

Table 2: Hyperparameters for MOSS and DQN. These hyperparameters are fixed throughout all domains.
  Action repeat                1
  Frame repeat                 12
  Seed frames                  4000
  n-step returns               3
  Mini-batch size              1048
  Discount (γ)                 0.99
  Optimizer                    Adam
  Learning rate                0.0001
  Agent update frequency       2
  Critic target EMA rate (τ)

We made modifications to MOSS to evaluate in discrete action settings. Tab. 2 details the hyperparameters used for Double DQN and MOSS in the ViZDoom environment.


SPECI: Skill Prompts based Hierarchical Continual Imitation Learning for Robot Manipulation

Xu, Jingkai, Nie, Xiangli

arXiv.org Artificial Intelligence

Real-world robot manipulation in dynamic unstructured environments requires lifelong adaptability to evolving objects, scenes and tasks. Traditional imitation learning relies on static training paradigms, which are ill-suited for lifelong adaptation. Although Continual Imitation Learning (CIL) enables incremental task adaptation while preserving learned knowledge, current CIL methods primarily overlook the intrinsic skill characteristics of robot manipulation or depend on manually defined and rigid skills, leading to suboptimal cross-task knowledge transfer. To address these issues, we propose Skill Prompts-based HiErarchical Continual Imitation Learning (SPECI), a novel end-to-end hierarchical CIL policy architecture for robot manipulation. The SPECI framework consists of a multimodal perception and fusion module for heterogeneous sensory information encoding, a high-level skill inference module for dynamic skill extraction and selection, and a low-level action execution module for precise action generation. To enable efficient knowledge transfer on both skill and task levels, SPECI performs continual implicit skill acquisition and reuse via an expandable skill codebook and an attention-driven skill selection mechanism. Furthermore, we introduce mode approximation to augment the last two modules with task-specific and task-sharing parameters, thereby enhancing task-level knowledge transfer. Extensive experiments on diverse manipulation task suites demonstrate that SPECI consistently outperforms state-of-the-art CIL methods across all evaluated metrics, revealing exceptional bidirectional knowledge transfer and superior overall performance.

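The attention-driven skill selection over an expandable skill codebook can be sketched minimally as follows. This is an illustrative stand-in, not SPECI's actual implementation; the names, shapes, and the single-query formulation are all assumptions.

```python
import numpy as np

def select_skills(query, codebook, temperature=1.0):
    """Attention-driven selection over a skill codebook.

    query:    (dim,) task/context embedding.
    codebook: (num_skills, dim) matrix, one learned skill prompt per row.
    Returns an attention-weighted mixture of codebook entries and the
    attention weights themselves.
    """
    scores = codebook @ query / temperature  # similarity of query to each skill
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ codebook, weights
```

Because the codebook is just a matrix of rows, "expanding" it amounts to appending a new row when a new skill is acquired, without retraining the selection mechanism.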


Integrating Functionalities To A System Via Autoencoder Hippocampus Network

Luo, Siwei

arXiv.org Artificial Intelligence

Integrating multiple functionalities into a system poses a fascinating challenge to the field of deep learning. While the precise mechanisms by which the brain encodes and decodes information, and learns diverse skills, remain elusive, memorization undoubtedly plays a pivotal role in this process. In this article, we delve into the implementation and application of an autoencoder-inspired hippocampus network in a multi-functional system. We propose an autoencoder-based memorization method for the policy function's parameters. Specifically, the encoder of the autoencoder maps the policy function's parameters to a skill vector, while the decoder retrieves the parameters via this skill vector. The policy function is dynamically adjusted to suit the corresponding task. A graph neural network over skill vectors is then employed to represent the homeomorphic topological structure of subtasks and to manage subtask execution.
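The encode/decode scheme described here can be sketched with a minimal linear autoencoder over a flattened parameter vector. The dimensions, the class name, and the use of a fixed orthonormal projection in place of learned weights are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

class ParamAutoencoder:
    """Minimal linear autoencoder over flattened policy parameters.

    The encoder compresses a policy's parameter vector into a short
    "skill vector"; the decoder retrieves an approximation of the
    parameters from that skill vector.
    """

    def __init__(self, param_dim, skill_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Orthonormal columns (via QR) stand in for learned weights
        self.enc, _ = np.linalg.qr(rng.normal(size=(param_dim, skill_dim)))

    def encode(self, params):
        return params @ self.enc  # (param_dim,) -> (skill_dim,) skill vector

    def decode(self, skill):
        return skill @ self.enc.T  # (skill_dim,) -> (param_dim,) parameters
```

With orthonormal encoder columns, decode is the transpose of encode, so any parameter vector lying in the encoder's column span is recovered exactly; a trained autoencoder would instead learn this span from stored policies.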


Computational Teaching for Driving via Multi-Task Imitation Learning

Gopinath, Deepak, Cui, Xiongyi, DeCastro, Jonathan, Sumner, Emily, Costa, Jean, Yasuda, Hiroshi, Morgan, Allison, Dees, Laporsha, Chau, Sheryl, Leonard, John, Chen, Tiffany, Rosman, Guy, Balachandran, Avinash

arXiv.org Artificial Intelligence

Driving is a sensorimotor task that is done often and requires a degree of competency that has to be taught. While daily driving is complex and safety-critical, performance driving requires a higher degree of competency in handling the vehicle at high speeds and at the limits of stability, and requires years of one-on-one instruction and practice to master. Although driving instructors can help drivers perform better and more safely [1], their availability is limited and costly. Hence, there is a clear need for automated teaching that can help drivers improve at the population scale. Driving instructors, e.g. in performance track driving [2], rely on their expertise in the driving task and their inference of students' skill levels to effectively teach students of various skill levels and learning styles. Instructors can gauge their students' skill levels and estimate what a student might do in a given scenario to provide contextually relevant verbal instructions to the student. For example, consider how an instructor in the passenger seat might instruct a student driver on the appropriate timing for braking or the lateral positioning of the car with respect to the racing line (the optimal minimum-time path around a race course). The teacher's ability to judge whether the student can maintain the racing line or will oversteer in a turn influences what instructions are provided. An automated teaching system for driving should be able to take in relevant vehicle context (pose and dynamics, map information, etc.) and other factors (e.g., driver monitoring) as inputs and output appropriate teaching actions for the


Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models

Karvonen, Adam

arXiv.org Artificial Intelligence

Language models have shown unprecedented capabilities, sparking debate over the source of their performance. Is it merely the outcome of learning syntactic patterns and surface-level statistics, or do they extract semantics and a world model from the text? Prior work by Li et al. investigated this by training a GPT model on synthetic, randomly generated Othello games and found that the model learned an internal representation of the board state. We extend this work into the more complex domain of chess, training on real games and investigating our model's internal representations using linear probes and contrastive activations. The model is given no a priori knowledge of the game and is solely trained on next-character prediction, yet we find evidence of internal representations of board state. We validate these internal representations by using them to make interventions on the model's activations and edit its internal board state. Unlike Li et al.'s prior synthetic dataset approach, our analysis finds that the model also learns to estimate latent variables like player skill to better predict the next character. We derive a player skill vector and add it to the model, improving the model's win rate by up to 2.6 times.
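The "derive a skill vector and add it to the model" step follows the general contrastive-activation recipe: average activations recorded under two contrasting conditions and take the difference as a steering direction. The sketch below is an illustrative version of that recipe, not the paper's exact procedure; all names and shapes are assumptions.

```python
import numpy as np

def derive_skill_vector(high_skill_acts, low_skill_acts):
    """Difference of mean activations between contrasting conditions.

    high_skill_acts, low_skill_acts: (num_samples, dim) residual-stream
    activations recorded on games by strong vs. weak players. The mean
    difference points along the latent "player skill" direction.
    """
    return high_skill_acts.mean(axis=0) - low_skill_acts.mean(axis=0)

def steer(activation, skill_vector, scale=1.0):
    # At inference time, add the scaled skill direction to a layer's activation
    return activation + scale * skill_vector
```

Intervening this way, with the vector added at a chosen layer during generation, is how a latent variable like estimated player skill can be pushed toward "strong player" to change the model's move predictions.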